73 research outputs found

Tackling Data Bias in Painting Classification with Style Transfer

Author: Li Frederick W.B.
Shum Hubert P.H.
Vijendran Mridula
Publication date: 01/01/2023

Training classifiers on painting collections is difficult because of model bias from domain gaps and data bias from the uneven distribution of artistic styles. Previous techniques such as data distillation, traditional data augmentation and style transfer improve classifier training using task-specific training datasets or domain adaptation. We propose a system to handle data bias in small painting datasets like the Kaokore dataset while simultaneously accounting for domain adaptation in fine-tuning a model trained on real-world images. Our system consists of two stages: style transfer and classification. In the style transfer stage, we generate the stylized training samples per class with uniformly sampled content and style images and train the style transformation network per domain. In the classification stage, we can interpret the effectiveness of the style and content layers at the attention layers when training on the original training dataset and the stylized images. We can trade off model performance and convergence by dynamically varying the proportion of augmented samples in the majority and minority classes. We achieve comparable results to the SOTA with fewer training epochs and a classifier with fewer training parameters.

Durham Research Online
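
The abstract above mentions varying the proportion of augmented samples between majority and minority classes. The following is only a minimal sketch of one way such a per-class budget could be computed. The function name `stylized_sample_budget`, the `fill_fraction` parameter, the top-up rule and the class labels are all hypothetical and are not taken from the paper.

```python
def stylized_sample_budget(class_counts, fill_fraction):
    """Return how many stylized samples to add for each class.

    Assumption (not from the paper): every class is topped up towards the
    size of the largest class, and fill_fraction in [0, 1] controls how much
    of that gap is filled with stylized samples.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be in [0, 1]")
    largest = max(class_counts.values())
    return {
        label: round(fill_fraction * (largest - count))
        for label, count in class_counts.items()
    }


# Hypothetical class sizes for an imbalanced dataset.
counts = {"class_a": 100, "class_b": 40, "class_c": 10}
budget = stylized_sample_budget(counts, fill_fraction=0.5)
print(budget)
```

Changing `fill_fraction` between training phases would be one simple way to vary the share of augmented samples over time under these assumptions.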

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

Author: Breckon Toby P.
Li Li
Shum Hubert P.H.
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 28/03/2023

Whilst the availability of 3D LiDAR point cloud data has significantly grown in recent years, annotation remains expensive and time-consuming, leading to a demand for semi-supervised semantic segmentation methods with application domains such as autonomous driving. Existing work very often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational costs. In addition, many use uniform sampling to reduce the ground-truth data required for learning, often resulting in sub-optimal performance. To address these issues, we propose a new pipeline that employs a smaller architecture, requiring fewer ground-truth annotations to achieve superior segmentation accuracy compared to contemporary approaches. This is facilitated via a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. To leverage the use of limited annotated data samples, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work in terms of mIoU, using less labeled data, on the SemanticKITTI (59.5@5%) and ScribbleKITTI (58.1@5%) benchmark datasets, based on a 2.3× reduction in model parameters and 641× fewer multiply-add operations whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More).

arXiv.org e-Print Archive

Durham Research Online
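
The abstract above attributes part of its parameter reduction to a depthwise separable convolution module. As a rough illustration of why separating a convolution reduces parameters, the sketch below compares weight counts for a standard 3D convolution and a generic dense depthwise separable one. It ignores biases and sparsity, the function names are illustrative, and it does not reproduce the paper's sparse module or its reported 2.3× whole-network figure.

```python
def standard_conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (no bias)."""
    return k ** 3 * c_in * c_out


def depthwise_separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise (per-channel) k^3 conv followed by a 1x1x1 pointwise conv."""
    depthwise = k ** 3 * c_in      # one k^3 filter per input channel
    pointwise = c_in * c_out       # 1x1x1 conv that mixes channels
    return depthwise + pointwise


# Example layer sizes chosen for illustration only.
standard = standard_conv3d_params(c_in=64, c_out=64, k=3)
separable = depthwise_separable_conv3d_params(c_in=64, c_out=64, k=3)
print(standard, separable, standard / separable)
```

The saving for a single layer grows with the kernel size and the number of output channels, which is why the overall reduction for a full network depends on its layer mix.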

CP-AGCN: Pytorch-based Attention Informed Graph Convolutional Network for Identifying Infants at Risk of Cerebral Palsy

Author: Ho Edmond S.L.
Shum Hubert P.H.
Zhang Haozheng
Publication venue: Elsevier
Publication date: 06/09/2022

Early prediction is clinically considered one of the essential parts of cerebral palsy (CP) treatment. We propose to implement a low-cost and interpretable classification system for supporting CP prediction based on General Movement Assessment (GMA). We design a Pytorch-based attention-informed graph convolutional network for the early identification of infants at risk of CP from skeletal data extracted from RGB videos. We also design a frequency-binning module for learning the CP movements in the frequency domain while filtering noise. Our system only requires consumer-grade RGB videos for training to support interactive-time CP prediction by providing an interpretable CP classification result.

arXiv.org e-Print Archive

Durham Research Online

Enlighten

Unifying Human Motion Synthesis and Style Transfer with Denoising Diffusion Probabilistic Models

Author: Chang Ziyi
Findlay Edmund J.C.
Shum Hubert P.H.
Zhang Haozheng
Publication date: 01/01/2023

Generating realistic motions for digital humans is a core but challenging part of computer animation and games, as human motions are both diverse in content and rich in styles. While the latest deep learning approaches have made significant advancements in this domain, they mostly consider motion synthesis and style manipulation as two separate problems. This is mainly due to the challenge of learning both motion contents that account for the inter-class behaviour and styles that account for the intra-class behaviour effectively in a common representation. To tackle this challenge, we propose a denoising diffusion probabilistic model solution for styled motion synthesis. As diffusion models have a high capacity brought by the injection of stochasticity, we can represent both inter-class motion content and intra-class style behaviour in the same latent. This results in an integrated, end-to-end trained pipeline that facilitates the generation of optimal motion and exploration of content-style coupled latent space. To achieve high-quality results, we design a multi-task diffusion model architecture that strategically generates aspects of human motions for local guidance. We also design adversarial and physical regulations for global guidance. We demonstrate superior performance with quantitative and qualitative results and validate the effectiveness of our multi-task architecture.

Durham Research Online

A Quadruple Diffusion Convolutional Recurrent Network for Human Motion Prediction

Author: Ho Edmond S.L.
Leung Howard
Men Qianhui
Shum Hubert P.H.
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 16/11/2020

Recurrent neural networks (RNNs) have become popular for human motion prediction thanks to their ability to capture temporal dependencies. However, they have limited capacity in modeling the complex spatial relationship in the human skeletal structure. In this work, we present a novel diffusion convolutional recurrent predictor for spatial and temporal movement forecasting, with multi-step random walks traversing bidirectionally along an adaptive graph to model interdependency among body joints. In the temporal domain, existing methods rely on a single forward predictor with the produced motion deflecting to the drift route, which leads to error accumulations over time. We propose to supplement the forward predictor with a forward discriminator to alleviate such motion drift in the long term under adversarial training. The solution is further enhanced by a backward predictor and a backward discriminator to effectively reduce the error, such that the system can also look into the past to improve the prediction at early frames. The two-way spatial diffusion convolutions and two-way temporal predictors together form a quadruple network. Furthermore, we train our framework by modeling the velocity from observed motion dynamics instead of static poses to predict future movements, which effectively reduces the discontinuity problem at early prediction. Our method outperforms the state of the art on both 3D and 2D datasets, including the Human3.6M, CMU Motion Capture and Penn Action datasets. The results also show that our method correctly predicts both high-dynamic and low-dynamic moving trends with less motion drift.

Durham Research Online

Northumbria Research Link

Enlighten
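
The abstract above describes modeling velocity from observed motion rather than static poses. A minimal sketch of that representation change follows, assuming each pose is a flat list of joint coordinates. The helper names are illustrative and this is not the authors' network, only the frame-differencing step and its inverse.

```python
def poses_to_velocities(poses):
    """Frame-to-frame differences: velocity[t] = pose[t + 1] - pose[t]."""
    return [
        [curr - prev for prev, curr in zip(pose_a, pose_b)]
        for pose_a, pose_b in zip(poses, poses[1:])
    ]


def velocities_to_poses(first_pose, velocities):
    """Rebuild poses by accumulating velocities onto the first observed pose."""
    poses = [list(first_pose)]
    for velocity in velocities:
        poses.append([p + dv for p, dv in zip(poses[-1], velocity)])
    return poses


# Three frames of a toy two-coordinate "pose".
observed = [[0.0, 0.0], [1.0, 0.5], [3.0, 1.5]]
velocities = poses_to_velocities(observed)
rebuilt = velocities_to_poses(observed[0], velocities)
print(velocities, rebuilt)
```

Because a predicted pose is obtained by adding a velocity to the last observed pose, the first predicted frame starts from the observation instead of being regressed independently, which is one plausible reading of how this representation helps continuity.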

Cerebral Palsy Prediction with Frequency Attention Informed Graph Convolutional Networks

Author: Ho Edmond
Shum Hubert P.H.
Zhang Haozheng
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/07/2022

Early diagnosis and intervention are clinically considered the paramount part of treating cerebral palsy (CP), so it is essential to design an efficient and interpretable automatic prediction system for CP. We highlight a significant difference between CP infants' frequency of human movement and that of the healthy group, which improves prediction performance. However, existing deep learning-based methods do not use the frequency information of infants' movement for CP prediction. This paper proposes a frequency attention informed graph convolutional network and validates it on two consumer-grade RGB video datasets, namely the MINI-RGBD and RVI-38 datasets. Our proposed frequency attention module aids in improving both classification performance and system interpretability. In addition, we design a frequency-binning method that retains the critical frequency of the human joint position data while filtering the noise. Our method achieves state-of-the-art prediction performance on both datasets. Our work demonstrates the effectiveness of frequency information in supporting the prediction of CP non-intrusively and provides a way of supporting the early diagnosis of CP in resource-limited regions where clinical resources are scarce.

Northumbria Research Link
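
The abstract above refers to a frequency-binning method applied to joint position data. The sketch below shows a generic version of the idea under stated assumptions: a plain discrete Fourier transform of one joint coordinate over time, with the magnitudes summed into a few contiguous bins. The function names, the equal-width binning rule and the toy signal are hypothetical, and the paper's actual module may differ.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the non-negative-frequency DFT coefficients of a real signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def bin_magnitudes(magnitudes, num_bins):
    """Sum magnitudes into num_bins contiguous, roughly equal-width bins."""
    size = len(magnitudes)
    edges = [round(i * size / num_bins) for i in range(num_bins + 1)]
    return [sum(magnitudes[lo:hi]) for lo, hi in zip(edges, edges[1:])]


# Toy joint trajectory: a pure oscillation at 4 cycles per 16 frames.
signal = [math.sin(2 * math.pi * 4 * t / 16) for t in range(16)]
binned = bin_magnitudes(dft_magnitudes(signal), num_bins=3)
print(binned)
```

In this toy case the energy lands in the middle bin. Discarding bins with little energy would be one simple, assumed way to suppress noise before feeding the features to a classifier.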

A Two-stream Convolutional Network for Musculoskeletal and Neurological Disorders Prediction

Author: Ho Edmond S.L.
Leung Howard
Men Qianhui
Shum Hubert P.H.
Zhu Manli
Publication venue: Springer
Publication date: 01/01/2022

Musculoskeletal and neurological disorders are the most common causes of walking problems among older people, and they often lead to diminished quality of life. Analyzing walking motion data manually requires trained professionals and the evaluations may not always be objective. To facilitate early diagnosis, recent deep learning-based methods have shown promising results for automated analysis, which can discover patterns that have not been found in traditional machine learning methods. We observe that existing work mostly applies deep learning on individual joint features such as the time series of joint positions. Due to the challenge of discovering inter-joint features such as the distance between feet (i.e. the stride width) from generally smaller-scale medical datasets, these methods usually perform sub-optimally. As a result, we propose a solution that explicitly takes both individual joint features and inter-joint features as input, relieving the system of the need to discover more complicated features from small data. Due to the distinctive nature of the two types of features, we introduce a two-stream framework, with one stream learning from the time series of joint position and the other from the time series of relative joint displacement. We further develop a mid-layer fusion module to combine the discovered patterns in these two streams for diagnosis, which results in a complementary representation of the data for better prediction performance. We validate our system with a benchmark dataset of 3D skeleton motion that involves 45 patients with musculoskeletal and neurological disorders, and achieve a prediction accuracy of 95.56%, outperforming state-of-the-art methods.

Durham Research Online

Northumbria Research Link

PubMed Central

Enlighten
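
The abstract above distinguishes individual joint features (joint positions over time) from inter-joint features such as the distance between the feet. The following sketch shows how the second kind of feature could be derived from the first. It assumes each frame is a dictionary mapping a joint name to 3D coordinates; the joint names and helper functions are hypothetical, and the plain 3D distance is used only as a stand-in for stride width.

```python
import math


def relative_displacement(joint_a, joint_b):
    """Vector from joint_a to joint_b in a single frame."""
    return tuple(b - a for a, b in zip(joint_a, joint_b))


def inter_joint_distance_series(frames, name_a, name_b):
    """Euclidean distance between two named joints for every frame."""
    return [math.dist(frame[name_a], frame[name_b]) for frame in frames]


# Two toy frames with hypothetical joint names.
frames = [
    {"left_foot": (0.0, 0.0, 0.0), "right_foot": (0.3, 0.0, 0.4)},
    {"left_foot": (0.1, 0.0, 0.0), "right_foot": (0.1, 0.0, 0.2)},
]
distances = inter_joint_distance_series(frames, "left_foot", "right_foot")
offset = relative_displacement(frames[1]["left_foot"], frames[1]["right_foot"])
print(distances, offset)
```

Supplying such series directly as a second input stream means a model trained on a small dataset does not have to learn the pairwise relationship on its own.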

Facial reshaping operator for controllable face beautification

Author: Aslam Nauman
Hu Shanfeng
Li Frederick W.B.
Liang Xiaohui
Shum Hubert P.H.
Publication venue: Elsevier
Publication date: 08/10/2020

Posting attractive facial photos is part of everyday life in the social media era. Motivated by the demand, we propose a lightweight method to automatically and efficiently beautify the shapes of both portrait and non-portrait faces in photos, while allowing users to customize the beautification of individual facial features. Previous methods focus on the beautification of mostly frontal and neutral faces, without incorporating user controllability in the beautification process. To address these restrictions, we propose the Facial Reshaping Operator representation, which is affine-invariant, captures the pairwise geometric configuration of facial landmarks, and allows for efficient face beautification with the user-specified weights of individual facial parts. We also propose an unsupervised beautification method in the operator space of faces, where an input face is iteratively pulled towards a local nearby density mode with improved attractiveness. Our method distinguishes itself from commercial beautification tools in that it mildly enhances facial shapes without altering makeup or complexion, which complements these tools that lack fine-grained control over the attractiveness of facial shapes for users. The experimental results show that our method improves facial shape attractiveness for a large range of poses and expressions, demonstrating its potential applicability to photos seen every day on social media such as Facebook and Instagram.

Durham Research Online
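
The abstract above describes pulling an input iteratively towards a nearby density mode. To make that idea concrete, the sketch below uses mean shift with a Gaussian kernel on one-dimensional toy data. Mean shift is a generic mode-seeking procedure substituted here for illustration; the listing does not state which procedure the authors use in their operator space, and the function name, bandwidth and data are assumptions.

```python
import math


def mean_shift_1d(points, start, bandwidth, steps=50):
    """Move `start` towards a local density mode of `points` (Gaussian kernel)."""
    x = start
    for _ in range(steps):
        # Nearby points get large weights, distant points almost none.
        weights = [math.exp(-((p - x) ** 2) / (2 * bandwidth ** 2)) for p in points]
        # Shift to the weighted mean of the data around the current estimate.
        x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
    return x


# Two toy clusters; each start converges to the mode of its nearest cluster.
points = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mode_low = mean_shift_1d(points, start=1.3, bandwidth=0.5)
mode_high = mean_shift_1d(points, start=4.7, bandwidth=0.5)
print(mode_low, mode_high)
```

The bandwidth controls how local the pull is: a small value keeps the result close to the nearest cluster, which matches the "local nearby" wording in the abstract.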

CP-AGCN: Pytorch-based attention informed graph convolutional network for identifying infants at risk of cerebral palsy

Author: Ho Edmond S.L.
Shum Hubert P.H.
Zhang Haozheng
Publication venue: Elsevier BV
Publication date: 06/09/2022

Early prediction is clinically considered one of the essential parts of cerebral palsy (CP) treatment. We propose to implement a low-cost and interpretable classification system for supporting CP prediction based on General Movement Assessment (GMA). We design a Pytorch-based attention-informed graph convolutional network for the early identification of infants at risk of CP from skeletal data extracted from RGB videos. We also design a frequency-binning module for learning the CP movements in the frequency domain while filtering noise. Our system only requires consumer-grade RGB videos for training to support interactive-time CP prediction by providing an interpretable CP classification result.

arXiv.org e-Print Archive

Enlighten

A Pose-based Feature Fusion and Classification Framework for the Early Prediction of Cerebral Palsy in Infants

Author: Embleton Nicholas D.
Ho Edmond S.L.
Hu Pengpeng
Marcroft Claire
McCay Kevin D.
Munteanu Adrian
Shum Hubert P.H.
Woo Wai Lok
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 23/12/2021

The early diagnosis of cerebral palsy is an area which has recently seen significant multi-disciplinary research. Diagnostic tools such as the General Movements Assessment (GMA) have produced some very promising results. However, automating these processes may improve accessibility of the assessment and also enhance the understanding of movement development of infants. Previous works have established the viability of using pose-based features extracted from RGB video sequences to undertake classification of infant body movements based upon the GMA. In this paper, we propose a series of new and improved features, and a feature fusion pipeline for this classification task. We also introduce the RVI-38 dataset, a series of videos captured as part of routine clinical care. By utilising this challenging dataset we establish the robustness of several motion features for classification, subsequently informing the design of our proposed feature fusion framework based upon the GMA. We evaluate our proposed framework's classification performance using both the RVI-38 dataset and the publicly available MINI-RGBD dataset. We also implement several other methods from the literature for direct comparison using these two independent datasets. Our experimental results and feature analysis show that our proposed pose-based method performs well across both datasets. The proposed features afford us the opportunity to include finer detail than previous methods, and further model GMA-specific body movements. These new features also allow us to take advantage of additional body-part-specific information as a means of improving the overall classification performance, whilst retaining GMA-relevant, interpretable, and shareable features.

Durham Research Online

Northumbria Research Link

E-space: Manchester Metropolitan University's Research Repository

Enlighten